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Abstract A standard approach to computing expectations with respect to a given target measure is 
to introduce an overdamped Langevin equation which is reversible with respect to the target distribu¬ 
tion, and to approximate the expectation by a time-averaging estimator. As has been noted in recent 
papers 3061 72 , introducing an appropriately chosen nonreversible component to the dynamics is 
beneficial, both in terms of reducing the asymptotic variance and of speeding up convergence to the tar¬ 
get distribution. In this paper we present a detailed study of the dependence of the asymptotic variance 
on the deviation from reversibility. Our theoretical findings are supported by numerical simulations. 


1 Introduction 

1.1 Motivation 

In various applications arising in statistical mechanics, biochemistry, data science and machine learning 
19,38 39 42 , it is often necessary to compute expectations 


7r(/):=E^/= / f{x)Tr{dx) 


of an observable / with respect to a target probability distribution TT{dx) on with density ^{x) with 
respect to the Lebesgue measure, known up to the normalization constanlQ When the dimension d is 
large, standard deterministic quadrature approaches become intractable, and one typically resorts to 
Markov-Chain Monte Carlo (MCMC) methods 19 39,63 . In this approach, 7r(/) is approximated by a 
long-time average of the form: 




f{Xt)dt, 


( 1 ) 


where Xt is a Markov process chosen to be ergodic with respect to the target distribution tt. The 
Birkhoff-von Neumann Ergodic theorem 31 34 60 states that, for every observable / G L^(jr) we have 


lim f f{X,)ds = TT{f), 


TT — a.e. Xn = X. 


( 2 ) 
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If TT possesses a smooth, strictly positive density, then a natural choice for Xt is the overdamped Langevin 
dynamics 

dXt = X\ogTr{Xt)dt + V2dWt, (3) 

where Wt is a standard Brownian motion on Assuming that ([^ possesses a unique strong solution 
which is non-explosive, the process Xt is ergodic, with unique invariant distribution tt, such that ([^ 
holds. Under additional assumptions on the distribution tt and on the observable, this convergence 
result is accompanied by a central limit theorem which characterizes the asymptotic distribution of the 
fluctuations of nrif) about 7r(/), i.e. 




( 4 ) 


where cr^ is known as the asymptotic variance for the observable /. For the reversible process ra started 
from stationarity (i.e. Xq ~ tt), the Kipnis-Varadhan theorem 12 32 implies that Q holds with asymp¬ 
totic variance 

aj = 2{f -E^f,i-S)-\f -E^f))^, 

where (•, •) 7 r is the inner product in U^(7r) and S is the infinitesimal generator of Xt defined by 


5 = V log7r(a;)V • +A. 


( 5 ) 


Sampling methods based on Langevin diffusions have become increasingly popular due to their wide 
applicability and relative ease of implementation. In practice, a discretisation of ([^ may be used, which 
in general, will not be ergodic with respect to the target distribution tt. Thus, the discretisation is aug¬ 
mented with a Metropolis-Hastings accept/reject step 65 which guarantees that the resulting Markov 


chain remains reversible with respect to tt. The resulting scheme is known as the Metropolis-Adjusted 
Langevin Algorithm (MALA), see [65]. 


The computational cost required to approximate 7r(/) using TTrif) to a given level of accuracy de¬ 
pends on the target distribution tt, the observable / and the process Xt over which the long-time average 
is generated. Quite often, Xt will exhibit some form of metastability [W II 36 66 : the process Xt will 


remain trapped for a long time exploring one mode, with transitions between different modes occurring 
over longer timescales. When the observable depends directly on the metastable degrees of freedom (i.e. 
the observable takes different values in different metastable regions), the asymptotic variance of the 


estimator TTrif) may be very large. As a result, more samples are required to obtain an estimate of 7r(/) 
with the desired accuracy. A similar scenario arises when the mass of tt is tightly concentrated along 
a low-dimensional submanifold of as illustrated in Figure]^ In this case, reversible dynamics, such 
as (j^ will cause a very slow exploration of the support of tt. As a result, 7rr(/) will exhibit very large 
variance for observables which vary strongly along the manifold. 


As there are infinitel}0 many Markov processes with invariant distribution tt, a natural question is 
whether such a process can be chosen to have optimal performance. The two standard optimality criteria 
that are commonly used are: 

(a) With respect to speeding up convergence to the target distribution. 

(b) With respect to minimizing the asymptotic variance. 


These criteria can be used in order to introduce a partial ordering in the set of Markov chains or diffu¬ 
sion processes that are ergodic with respect to a given probability distribution 50 59 . From a practical 
perspective, the definite optimality criterion is that of minimizing the computational cost. We address 
this issue (at least partially) later in this paper. 


Within the family of reversible samplers, much work has been done to derive samplers which exhibit 
good (if not optimal) computational performance. This has motivated a number of variants of MALA 
which exploit geometric features of tt to explore the state space more effectively, including preconditioned 
MALA [64] , Riemannian Manifold MALA |20] and Stochastic Newton Methods |4^ . 


^ Formally, all diffusion processes Xt with drift b{x) and diffnsion o-(x) and generator £ = b ■ V + ^crcr^ : VV 
such that TT is the unique solution of the stationary Fokker-Planck equation £*Tr — 0 can be used to sample from 

TT. 
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— Reversible 

— Nonreversible 



Fig. 1: Trajectories of a reversible MALA chain and a nonreversible Langevin sampler generating sam¬ 
ples from a warped Gaussian distribution. The mass of the distribution is contentrated along a one¬ 
dimensional submanifold. Reversible samplers, such as MALA, perform a very slow exploration of the 
distribution, spending large amounts of time concentrated around a small region of the submanifold. On 
the other hand, nonreversible samplers are able to perform long “jumps” along the level-curves of the 
distribution, thus able to explore the distribution far more effectively. 


1.2 Nonreversible Langevin Dynamics 


An MCMC scheme which departs from the assumption of reversible dynamics is Hamiltonian MCMC 53 , 


which has proved successful in Bayesian inference. By augmenting the state space with a momentum 
variable, proposals for the next step of the Markov chain are generated by following Hamiltonian dynam¬ 
ics over a large, fixed time interval. The resulting nonreversible chain is able to make distant proposals. 
Various methods have been proposed which are related to this general idea of breaking nonreversibility by 
introducing an additional dimension to the state space and introducing dynamics which explore the en¬ 
larged space while still preserving the equilibrium distribution. In particular, the lifting method [TsPtItT 


is one such method applied to discrete state systems, where the Markov chain is “lifted” from the state 
space fl onto the space fl x {1,-1}. The transition probabilities in each copy are modified to intro¬ 
duce transitions between the copies to preserve the invariant distribution but now promote the sampler 
to generate long strides or trajectories. Similar methods based on introducing nonreversibility into the 
dynamics of discrete state chains to speed up convergence have been applied with success in various ap¬ 
plications JBlMlIlllI ^^|69| . These methods are also reminiscent of parallel tempering or replica exchange 
MCMC 51 , which are aimed at efficiently sampling from multimodal distributions. 


It is well documented that breaking detailed balance, i.e. considering a nonreversible diffusion process 
that is ergodic with respect to tt, can help accelerate convergence to equilibrium. In fact, it has been 
proved that, among all diffusion processes with additive noise that are ergodic with respect to tt, the 
reversible dynamics has the slowest rate of conver genc e to equilibrium, measured in terms of the 

spectral gap of the generator in tt) =: L^(7r), c.f. 37 . Adding a drift to ([^ that is divergence-free 

with respect to tt and that preserves the invariant measure of the dynamics will always accelerate con¬ 
vergence to equilibrium 28 29 37 54 61,72 . The optimal nonreversible perturbation can be identified 


and obtained in an algorithmic manner for diffusions with linear drift, whose invariant distribution is 
Gaussian 37 ; see also 


72 


The effect of nonreversibility in the dynamics on the asymptotic variance has also been studied. In 61 it 
is shown that small antisymmetric perturbations of the reversible dynamics always decrease the asymp¬ 
totic variance, and more recently in 62 Friedlin-Wentzell theory is used to study the limit of infinitely 
strong antisymmetric perturbations. In 30 the authors use spectral theory for selfadjoint operators to 
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study the effect of antisymmetric perturbations on diffusions on both and on compact manifolds and 
provide a general comparison result between reversible and nonreversible diffusions. This work is related 
with previous studies on the behavior of the spectral gap of the generator when the strength of the 
nonreversible perturbation is increased 


13 18 


1.3 Objectives of this paper 


In this paper we present a detailed analytical and computational study of the effect of adding a nonre¬ 
versible drift to the reversible overdamped Langevin dynamics (j^ on the asymptotic variance dj. We 
will consider the nonreversible dynamics defined by [ 2 ^| 2 ^ : 


dX] = (V log^(X7) + a-i{X])) dt + V2dWt, ( 6 ) 

where the smooth vector field 7 is taken to be divergence-free with respect to the distribution tt, 

V • (yrr) = 0. (7) 

There are infinitely many vector fields that satisfy Q and a general formula can be derived using 
Poincare’s lemma 29 6^. We can, for example, construct such vector fields by taking 


7 {x) = JV log 7 r(a:), J =—J^ 


( 8 ) 


In (|^ we have already normalized the various vector fields and we have introduced a parameter a which 
measures the strength of the deviation from reversible dynamics. The generator of the diffusion process 
X2 can be decomposed into a symmetric and an antisymmetric part in L? (tt) , representing the reversible 
and irreversible parts of the dynamics, respectively 56 Sec. 4.6]: 


£ = 5 -I- aA, 


( 9 ) 


where S is given in ([^ and A = 7 • V. We are particularly interested in quantifying the reduction in 
the asymptotic variance obtained caused by breaking detailed balance. It is well known 32 33 that for 
square integrable observables, the asymptotic variance, denoted by crj(a) can be written in terms of the 
solution of the Poisson equation 

- (S + aA)r/> = f - E^f, (10) 


as 




( 11 ) 


Our first goal is to obtain an explicit formula for cr^(a) in terms of S and A. Moreover, through 
a spectral decomposition of the operator (—we investigate the dependence of the asymptotic 
variance on the strength of the nonreversible perturbation. Based on earlier work by Golden and Papan¬ 
icolaou 22 , Majda et al [Slliil and Bhattacharya et al ItIIs], in 58 , an expression is obtained for the 


effective self-diffusion coefficient for a tagged particle whose microscopic behaviour is determined by a 
nonreversible Markov process. The basic idea is the introduction of the operator 


g := {-S)-^A, 


( 12 ) 


which is skewadjoint in the weighted Sobolev space 

= {f G L^ijr) : 7r(/) = 0 and f)^ < 00 }, 

where (•, denotes the T^( 7 r) inner product; see Section]^ We follow a similar approach in this paper, 
from which a quite explicit formula for the asymptotic variance cr^{a) is derived using the spectral the¬ 
orem for selfadjoint operators. From this expression we can immediately deduce that the addition of a 
nonreversible perturbation reduces the asymptotic variance and we can also study the small and large 
a asymptotics of aj; see Theorem One of the results that we prove is that the large a behaviour of 
the asymptotic variance depends on the detailed properties of the vector field 7 : when the nullspace of 
the antisymmetric part of the generator M = 7 ■ V consists of only constants in then the asymp¬ 
totic variance converges to 0. Indeed, in this case, in the |a| —>■ 00 limit, the limiting dynamics become 
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deterministic, characterized by the Liouville operator 7 • V. On the other hand, when the nullspace of 
A contains nontrivial functions in there will exist observables for which the asymptotic variance a'j 
converges to a positive constant as |a| —>■ 00 . 

The effect of the antisymmetric part on the long time dynamics of diffusion processes has been also stud¬ 
ied extensively in the context of turbulent diffusion 43 and fluid mixing. The effect of an incompressible 


flow on the convergence of the solution to the advection-diffusion equation on a compact manifold to its 
mean value (i.e. when tt = 1) was first studied in 13 . In particular, the concept of a relaxation enhancing 


18 , where 


18 it is shown 


flow was introduced and it was shown that a divergence-free flow is relaxation enhancing if and only if the 
Liouville operator A = v has no eigenfunctions in the Sobolev space H^. An equivalent formulation 
of this result is that an incompressible flow is relaxation enhancing if and only if it is weakly mixing. 
Examples of relaxation enhancing flows are given in 13 . This problem was studied further in 
it also mentioned that there are very few examples of relaxation enhancing flows. In 
that the spectral gap of the advection-diffusion operator £ = au • V -I- 4\, for a divergence-free drift v 
and a S M, remains bounded above in the limit as a —)■ ±00 by a negative constant if and only if the 
advection operator has an eigenfunction in . The results are reminiscent of the results we mention 
above on the necessary and sufficient condition to obtain a reduction of asymptotic variance in the limit 
a ± 00 . 


Our analysis of the asymptotic variance based on the careful study of the Poisson equation (10), 
enables us to study in detail the problem of finding the nonreversible perturbation giving rise to min¬ 
imum asymptotic variance for diffusions with linear drift, i.e. diffusions whose invariant measure is 
Gaussian, over a large class of observables. Diffusions with linear drift were considered in [37| where the 
optimal nonreversible perturbation with respect to accelerating convergence was obtained. For linear and 
quadratic observables, we can give a complete solution to this problem, and construct nonreversible per¬ 
turbations that provide a dramatic reduction in asymptotic variance. Moreover, we demonstrate that the 
conditions under which the variance is reduced are very different from those of maximising the spectral 
gap discussed in 37 . In particular, we show how a nonreversible perturbation can dramatically reduce 


the asymptotic variance for the estimator even though no such improvement can be made on the 

rate of convergence to equilibrium. 

Guided by our theoretical results, we can then study numerically the reduction in the asymptotic variance 
due to the addition of a nonreversible drift for some toy models from molecular dynamics. In particular. 


we study the problem of computing expectations of observables with respect to a warped Gaussian 23 
in two dimensions, as well as a simple model for a dimer in a solvent 38 . The numerical experiments 


reported in this paper illustrate that a judicious choice of the nonreversible perturbation, dependent on 
the target distribution and the observable, can dramatically reduce the asymptotic variance. 

To compute nrif) numerically, we use an Euler-Maruyama discretisation of (|^. The resulting discreti¬ 
sation error introduces an additional bias in the estimator for 7 r(/), see 47 for a comprehensive error 
analysis. This imposes additional constraints on the magnitude of the nonreversible drift, since increas¬ 
ing a arbitrarily will give rise to a large discretisation error which must be controlled by taking smaller 
timesteps. A natural question is whether the increase in the computational cost due to the necessity of 
taking smaller timesteps negates any benefits of the resulting variance reduction. To study this problem, 
we compare the computational cost of the unadjusted nonreversible Langevin sampler with the corre¬ 
sponding MALA scheme}^ Our numerical results, together with the theoretical analysis for diffusions 
with linear drift, show that the nonreversible Langevin sampler can outperform the MALA algorithm, 
provided that the nonreversible perturbation is well-chosen. Finally, we consider a higher order numer¬ 
ical scheme for generating samples of (§, based on splitting the reversible and nonreversible dynamics. 
Numerically, we investigate the properties of this integrator, and demonstrate that its improved stability 
and discretisation error make it a good numerical scheme for computing the estimator T^xif) using a 
nonreversible diffusion. 

The rest of the paper is organized as follows. In Section we describe how the central limit theo¬ 
rem ([^ arises from the solution of the Poisson equation associated with the generator of the dynamics. 


^ As we illustrate in our paper, there is no point in considering the Metropolis adjusted sampler with a nonre¬ 
versible proposal, since the addition of the accept-reject step renders the resulting Markov chain reversible and 
any nonreversibility-induced variance reduction is lost. 
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In Section [^we analyse the asymptotic variance and formulate the problem of finding the optimal per¬ 
turbation with respect to minimising a'j, for a fixed observable, or over the space of square-integrable 
observables. Moreover, following the analysis described in 58 , we derive a spectral representation for the 
asymptotic variance a'j, in terms of the discrete spectrum of the operator Q defined in (12), and recover 
estimates for the asymptotic variance for any value of a. In Section]^ we consider the case of Gaussian 
diffusions, which are amenable to explicit calculation to demonstrate the theory presented in this paper. 
In Section we provide various numerical examples to complement the theoretical results. Finally, in 
Section]^ we describe the bias-variance tradeoff for nonreversible Langevin samplers, and explore their 
computational cost. Conclusions and discussion on further work are presented in Section 


2 The Central Limit Theorem and estimates on the asymptotic variance via the Poisson 
equation 

In this section we make explicit some sufficient conditions under which the estimator 


7'T(/) = ^ 


fiX])ds, 


where denotes the solution of § , satisfies a central limit theorem of the form (j^. The fundamental 
ingredient for proving such a central limit theorem is establishing the well-posedness of the Poisson 
equation 

- C(l>{x) = f{x) -Tr{f), TT{(j))=0, (13) 

for all bounded and continous functions / : —)■ M, where C is defined by (|^, and obtaining estimates 

on the growth of the unique solution cj). As described previously, we shall assume that tt possesses a 
smooth, strictly positive density, also denoted by 7r(x), such that 7r(x) dx < oo and that the SDE (lb 
has a unique strong solution for all a € M. Applying the results detailed in 
the process possesses a Lyapunov function, which is sufficient to ensure t 
of X^, 46 48 , as detailed in the following assumption. 


49 , we shall assume tha 


Assumption A (Foster—Lyapunov Criterion) There exists a function U : 
c > 0 and 6 G M such that tt(U) < oo and 

CU{x) < —cU{x) + blc, and U{x) > 1, x G 

where Ic is the indicator function over a petite set. 


le exponential ergodicity 
—> M and constants 
(14) 


For the definition of a petite set we refer the reader to 48 . For the generator C corresponding to the 
process ([^, compact sets are always petite. As noted in 65 , a sufficient condition on tt for (§ to possess 
a Lyapunov function is the following. 


Assumption B The density tt is hounded and for some 0 < /3 < 1; 

liminf ((1 — /3)|Vlog7r(a;)p -I- Z\log7r(x)) > 0. 

|a:|—>-oo 


(15) 


Lemma 1 Theorem 3.3] Suppose that Assumption \b\ holds then the Foster-Lyapunov criterion 

holds for with 

U{x) = 7T~^{x). 

Moreover, if the nonreversible term is of the form 7(2;) = JVlog7r(x), for J G antisymmetric, then 

this choice of U is a Lyapunov function for Xf defined by for all a G K. 

Proof For this choice of 7 the generator of § has the form 

C = {I + aJ)X log7r(a:) - X + A, a G M, 

For U{x) = 7T~^{x) we obtain: 

CU{x) = —j3TT~^{x)X log7r(a:) • (J -I- aJ)X log7r(a;) — / 3 V • (7r“^(a::)V log7r(2;)) 

(1 — / 3 ) |V log7r(a::)|^ -|- Z\log7r(2:) tt~^{x). 
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Thus, by assumption (15), there exists e > 0 and M such that for |a;| > M: 


(1 — f3) |Vlog7r(a;)|^ + Z\log7r(a;) > e. 


and so 

CU{x) < -l3eU{x) + blcM, 

where Cm = {a;GK‘^ : |a;|<M} and 6 is a positive constant. 


Finally, we note that since tt is bounded, then U is bounded above from zero uniformly. Thus, U can be 
rescaled to satisfy the condition ?7 > 1, as is required by the Foster-Lyapunov criterion. □ 


Remark 1 Note that Assumption [ b] holds trivially when tt is a Gaussian distribution in More gener¬ 
ally, following 64 Example 1], consider Tr(x) cx exp(—p(a;)), where p(x) is a polynomial of order m such 
that p(x) —>■ oo as |a:| —>■ oo (necessarily m > 2 and m is even). Clearly 


lim inf 

|ir|—>-<50 


|Vlog7r(a:)p 

|Z\log7r(a;)| 


= 00 , 


and 


lim inf (1 — /?) |V log7r(x)F > 0, 

|rc|—>-oo 


so that (14) holds. On the other hand, consider the case when when tt is a Student’s t-distribution, 
7r(x) oc (1 -I- x^ jv) 5^ with v >2. Then it is straightforward to check that (14) will not hold. Indeed, 
since |Vlog7r(x)| —0, by 65 Theorem 2.4] this process is not exponentially ergodic. 


If condition (14) holds for then the process will be exponentially ergodic. More specifically, the 
law of the process started from a point x € will converge exponentially fast in the total variation 
norm to the equilibrium distribution tt. In particular, denoting by {P^)t>Q the semigroup associated with 
the diffusion process (1^, we have the following result. 


Theorem 1 @ Thm 8.^ @ Thm 3.10, 3.12], ^ Thm 5.2.c] 

Suppose that Assumption\^ holds for with Lyapunov function U. Then there exist positive constants 


c and A such that, for all x. 


WpUx,-) - Tr\\j,y < cU(x)e 


-At 


where pj{x,-) denotes the law of Xf given Xj = x and 
particular, for any probability measure v such that U € 


denotes the total variation norm. In 


lim ||(P7)V-7r| 




TV 


= 0 , 


where {P]])* denotes the adjoint of P]]. 


□ 


For a central limit theorem to hold for the process Xf, and thus for cr^ to be finite, it is necessary 
that the Poisson equation (10) is well-posed. The Foster-Lyapunov condition (14) is sufficient for this to 
hold. 


Theorem 2 \21\ Thm 3.2] Suppose that Assumption\^holds for the diffusion process & with Lyapunov 
function U. Then there exists a positive constant c such that for any j/p < U, the Poisson equation 
admits a unique zero mean solution cj) satisfying the bound |(()(x)|^ < cU{x). In particular, cj) G L^(7r). 

□ 

The technique of using a Poisson equation to obtain a central limit theorem for an additive functional 
of a Markov process is widely known, [^l6 55 . The approach is based on the fact that, at least formally, 


V<f(Xf) ■ dWs 


we can decompose T^tif) ~ T^if) into a martingale and a “remainder” term: 

1 


Mf) - 7r(/) = 


t 




'0 


=:Rt 


t 

Mt. 


t 
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Considering the rescaling Vi{T^t{f) — the martingale term y/tMt will converge in distribution to 

a Gaussian random variable with mean 0 and variance 


tr? = 2 / \\/<f>{x)\'^ TT{dx), 


by the central limit theorem for martingales 25 . It remains to control the remainder term \/iRt. We 
distinguish between two cases: If ^ tt, then since (j) S ViRt converges to 0 in and the 

result follows. In the more general case we must resort to a “propagation of chaos” argument (c.f. 
Section 8 ]), i.e. apply Theorem to show that 


12 


Exo=:l 






f{X])-7T{f)ds 


-Exo. 


H\^jy{X])-TT{f)ds 


0 , 


as r — >■ (X), for all continuous bounded functions H. The result then follows by decomposing 

ITt+rU) - Af) = fix]) ds+^ f{X]) ds, 

and applying the propagation of chaos argument to the second term. The conclusion is summarized in 
the following result, which provides a central limit theorem for X] starting from an arbitrary initial 
distribution v. 


Theorem 3 \21\ Thm 4-4] If Assumption^^ holds for Lyapunov function U, then for any f such that 
P{x) < U{x), there exists a constant 0 < cry< oo such that \/i['Kt{f) — n{f)) converges in distribution 
to an A/'( 0 ,crp distribution, as t ^ oo, for any initial distribution v, where 


cr? = 2 / |V(/>(x)f 7 r(dx). (16) 

□ 

In the remainder of this paper we shall study the dependence of (f, and thus cr^ on the choice of non- 
reversible perturbation 7 . We note that (16) is precisely the Dirichlet form associated with the dynamics 


C evaluated at the solution cf of the Poisson equation (13). 


3 Analysis of the Asymptotic Variance 


3.1 Mathematical Setting 


We present a few definitions from 33 that will be useful in the sequel. Let Lq)^) denote the set of L]{'k)- 
integrable functions with zero mean, with corresponding inner product (•, ■)■„■ and norm Ij-H,,.. Consider 
the operator S given by ([^ densely defined on Lo('^)' For fc G N, given the family of seminorms 


l^= {f,i-s)^f). 


(17) 


we define the function spaces 

■.= {f&Ll{TT) : ||/|U<+oo}. 


It follows that ll'll^ is a norm on Rf. For fc > 1, the norm || • ||fc satisfies the parallelogram identity 
and, consequently, the completion of with respect to the norm || • |jfc, which is denoted by R^, is a 
Hilbert space. The inner product (•,-)fc in "H* is defined through polarization. It is easy to check that, 
for f, g£ V{C), 

{f,9)i = {f,{-S)g). = {Xf,X9).. (18) 


We note that R° = LKtt). A careful analysis of the function space R^ is presented in 33 


The operator S is symmetric with respect to tt and can be extended to a selfadjoint operator on 
which is also denoted by S, with domain 'DiS) = R^. We shall make the following assumption on tt 
which is required, in addition to Assumption [ b| to ensure that C possesses a spectral gap in 

Assumption C 

lim |Vlog 7 r(a;)| = 00 . (19) 

|a;|—»-+oo 
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In the following lemma we establish a number of fundamental properties relating to the spectrum 
of £. Using these results, we then establish the well-posedness of the Poisson equation (13) for any 
/ e Ll{n). 

Lemma 2 Suppose that Assumptions^^ anc?|^/ioW. Then the embedding C Lg( 7 r) is compact. For 
all a G M, the operator L satisfies the following Poincare inequality in Lq{tt): 


X\\g\\l < {g,{-C)g).^, gGH^, 


( 20 ) 


where X is a positive constant, independent of the nonreversible perturbation. Moreover, for all f G Lg( 7 r) 
there exists a unique G TL^ such that —Cfi = f. 


Proof Assumption clearly implies that 

— Z\log 7 r(x) < (1 — (5) |Vlog 7 r(x)|^ + Mi, x G 


( 21 ) 


for some S G (0,1) and Mi > 0. Applying Theorem 8.5.3], relations ( [T^ and (21) imply the 
compactness of the embedding TL^ C Lo(’’’)- ^ result, following Theorem 8.6.1], tf 

for the Poincare inequality to hold for S, i.e. 


ris is sufficient 


m\l<\\9\\l = {9,i-S)9). 


9^H- 


from which (20) follows by the antisymmetry of A in L‘^{'k). The existence of this Poincare inequality 
implies that the semigroup P^ associated with converges exponentially fast to equilibrium, that is. 


for all / G Lq (’’’)■ 


Given / G define 


\P^J 


< e 


-xt I 


IItt ’ — 


t > 0 . 


(/)(a;) = / P]f{x)da. 


( 22 ) 

(23) 


By (22) it follows that (j) G Lg( 7 r) and from (23) we have —Cfi = f. Moreover, we have the bound 


so that 


UWi < c\\f\\^ < 00 . 


□ 


Throughout this section, we shall assume that AssumptionsJ^andj^hold, and moreover, for simplicity 
we shall make the following additional assumption: 

Assumption D The nonreversible perturbation 7 is smooth and bounded in L°°. 


We believe that this assumption could be relaxed, see in particular 30 for more general assumptions 


under which the following results should hold. We stick here to a simple presentation. In practice, this 
assumption is not very stringent. Indeed, suppose that there exists a smooth function 1 /; : K —>■ M>o such 
that 

^(U(a:))|VU(x)| < 1, for all X G (24) 


Then 7 satisfies 0 since 


1 


V ■ (nj) = --V ■ {JVe-^fjiV)) 

= -^V • (JVe-^) fjiV) - 4 ■ jyVe-^ 

Zj Zj 

= 0 , 

using the fact that J is antisymmetric. Moreover, it is clear that j{x) = J\/V{x)'ijj{V{x)) is bounded 
and smooth, thus satisfying Assumption]^ In particular, if U is a nonnegative polynomial function then 
condition (24) is satisfied by choosing tjj to he a. smooth non-negative function with compact support. 


Likewise, it is easy to satisfy if V has compact level sets. 
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We note that this choice of flow 7 leaves the potential V invariant, i.e. V(zt) is constant for all < > 0, 
where it = j(zt)- Thus, for large |a|, the flow aj will result in rapid exploration of the level surfaces of 
F, but the motion of Xt between level surfaces is entirely due to the reversible dynamics of the process. 
In particular, for potentials with energy barriers, the transition time for to cross a barrier will still 
satisfy the same Arrenhius law as the corresponding reversible process. Other choices of flow 7 are pos¬ 
sible. For example, one could alternatively consider a skew-symmetric matrix function J{x) as detailed 


41 . The corresponding flow would then be defined by 


7 ( 3 :) = —J{x)XV{x) + X ■ J{x), X G 


(25) 


It is straightforward to check that V • ( 77 r) = 0, using the fact that J{x) is skew-symmetric. If addi¬ 
tionally, the matrix function J is smooth with bounded derivative and compact support, then 7 satisfies 
Assumption As detailed in one can further generalise this choice of dynamics by additionally 
introducing a space dependent diffusion tensor, and an appropriate correction of the drift to maintain 
ergodicity with respect to tt. We do not consider this choice of dynamics in this paper, noting that most 
of the presented results can be readily generalized to this scenario. 

Given that Assumption [P] holds, then we have 

< ll7llLoo(Rd) ||m||i and J j{x) ■ Vu{x)TT{dx) = - Jx ■ {-f{x) ■ TT{x))u{x)dx = 0 , 
by §, so that the operator A'H} ^ Lq{'k) defined by A = 7 • V is well defined. 


3.2 An Expression for the Asymptotic Variance 

The main objective of this paper is to study the effect of the nonreversible perturbati on 7 on the 
asymptotic variance (Tp with the aim of choosing 7 so that cr^ is minimized. Integrating (16) by parts 
we can express Uj in terms of the Dirichlet form for C as follows: 

= j ^(dx) 

= — 4>{x)X ■ {X 4>{x)'k{x)) dx 


= — J cj){x) {A(j){x) + X log 7 r(x) • X(j)[x)) 7r(x) dx 
= {(!>,{-'S)(I))7T 

where the last line follows from the fact that A is antisymmetric in To( 7 r). 


(26) 


Starting from (26) we can obtain a quite explicit characterisation of Indeed, given / G 
we can rewrite the last line of (26) as 


aj = 2 (/, (-£)-= 2 (/, [(-£)-'] ^ /)., 


(27) 


where denotes the symmetric part of the operator. Using the fact that —£ is invertible on Lq{tt) for 
all a G M by Lemmathe following result yields an expression for cr^ia) in terms of S, A and a. 

Lemma 3 Let C = S + a A, for a G M. Then we have 

[(-£)-!] ® = [-5 + a^A*{-S)-^A] , (28) 

where A* = — A denotes the adjoint of A in In particular, for all f G LqItt), 

a%a) = 2{f, {-C)-^f). < 2(/, {Sy^f). = a}{0). (29) 
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Proof Since —S is positive, the operator Q = (—5)“5 ; Lq{'k) can be defined using functional 

calculus. Consider the operator C := QA. We can write —S + a'^A*{—S)~^A = —S + a^C*C, which can 
be shown to be closed in Lo(^) with domain Moreover, this operator has nullspace {0}, so that the 
inverse is also densely defined on Lq{tt). To show that (28) holds, we expand the left hand side to get 

1 


= 2 2 [(^ + aQAQ)-^ + (J - aQAQ)-^] Q. 


Since 


(/ + aQAQ)-^ + (/ - aQAQ)~^ = (/ + aQMQ)”^ (/ - aQAQ)~^ (/ - aQAQ) 

+ (/ - aQAQ)-^ (/ + aQAQ)~^ (/ + aQAQ) 
= [I - a^QAQ^AQY^ 21, 

therefore 

[{-C)-^Y = Q [I - o^QAQ^AQY^ Q= [-5 + aM*(-5)-U]”\ 
as required. The inequality ( [2^ is then simply a consequence of the fact that 

+ -5 

where ^ denotes the partial ordering between selfadjoint operators on TQ(7r). 


□ 


Thus, the asymptotic variance is never inc rea sed by introducing a nonreversible perturbation, for all 
/ S Lg(7r). This had already been noted in 61 where an expression for the the asymptotic was derived 
as the curvature of the rate function of the empirical measure, and also in [30] using an approach similar 
to that above. Expression (|27| provides us with a formula for tr^ in terms of a symmetric quadratic form 
which is explicit in terms of A and S. 


3.3 Quantitative estimates for the Asymptotic Variance 


In this section we derive quantitative versions of (^28| and (29), using techniques developed in 58 


the analysis of the Green-Kubo formula, which is itself based on earlier work on the estimation of 
eddy diffusivity in turbulent diffusion |^[^ [44[[^ . 


for 

the 


Following the approach of 22 58 , to quantify the effect of the antisymmetric perturbation aA on the 
asymptotic variance, we define the operator Q = (—: Tif —>■ First we note that we can rewrite 

(28) as follows: 




and therefore, from (27) and (28): 




{f,[i-a^gr'f)^- 


(30) 


(31) 


From the boundedness of the nonreversible perturbation we have the following properties: 

Lemma 4 Suppose that Assumptionl^ holds, then the operator g = (—iS)“^A is skew-adjoint on 'Hf. 
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Proof For f,g G 

{Qf,9)i = {{-S)-^Af,{-S)g)^ = -{f,Ag)^ = -{f,{-S){-S)-Ug)^ = -{f,gg}i, 

so that Q is antisymmetric. Since Q is also bounded, it follows that Q is skewadjoint on PL^. □ 

Moreover, under appropriate assumptions on the target distribution tt, one can show that the operator 
Q is compact. 

Lemma 5 Suppose that Assumptions and^^hold, then the operator Q is compact on pA. 

Proof By Lemmathe embedding PL^ C Lq{tt) is compact, and it follows immediately that PL^ CpA is 
a compact embedding. Moreover, since 7 is bounded we have, 

11^/112 = {{-s)gf,{-s)gf). = {Af,Af). < ll/ll?, f&n\ 

from which the result follows. □ 


Extending pA to its complexification, there exists a compact selfadjoint operator P on pA such 
that g = iP. From the spectral theorem for compact selfadjoint operators 24, Theorem 6.21], the 
eigenfunctions of P form a complete orthonormal basis in 'H^, and the eigenvalues of P are real. We can 
partition the eigenfunctions into those spanning N Ker[r] = Ker[M] and those spanning We 
denote by 

{'^n}n=l, 2 , 3 ,... ’ {®"}ra=l, 2 , 3 ,... ’ 

the eigenvalues and corresponding eigenfunctions of the operator P restricted to For / G Lo(^) 
have the following unique decomposition for / = {—S)~^f in PL^: 


f = f^r + '^fnen, 


(32) 


where fj\f G Af and fn = (/,en)i. Consequently, using this spectral decomposition, we can rewrite (31) 
as 


a}{a) = 2(^f, f"^^. 

= 2 (/, [l-a\iPf]-'f)^ 
= 2(/, [l + a^PY'f)^ 


= 2 


In 


+ 00 


n—1 


1 


1 + a2A2 


(33) 


The conclusion of the above computation is summarized in the following result. 


Theorem 4 Suppose that Assumptions\^ [q ho ld. Let f G Lo(7r) and a G M, then the asymptotic 

variance cr'j{a) corresponding to Xf is given by {33). In particular, we obtain the following limiting 
values for the asymptotic variance: 


CK —>-0 


and. 


O'—>-±oo 


Ko) = 2 ||/||?, 

(34) 

= 2||/At|!t 

(35) 


□ 


From (33) we obtain the following bounds on the asymptotic variance 

+O0 


o'K®) “ 211/77111 + 2 


l/n 


+ 00 


n—1 


1 + a2A2 




n—1 


12 














The problem of choosing the optimal nonreversible perturbation in (§ to minimize the asymptotic 
variance over all observables in To('^) expressed as the following min-max problem: 

min max -=-——s- 

i&Am feLliTv) ll/ll^ 

where, for some constant M > 0, Am denotes the set of admissible nonreversible drifts, typically. 

Am = {7 e : V • ( 7 ^) = 0 and |j 7 ||^„„ < M} . 

This is equivalent to finding 7 € Am which solves this max-min problem 

max miner (—5 + ^*(—, (36) 

7GAm 


f)r 


where ^ = 7 • V- 


where (j[Ai] denotes the spectrum of the operator Ai on AQ( 7 r). It is important to make the distinction 
between this problem and that considered in [37| for finding the optimal spectral gap, namely finding 
7 S Am such that 

max minRe (ct (—5 — ^)). (37) 

7GAm- 


From (36) and (37) it is evident that the decrease in asymptotic variance, and the increase in spectral gap 
arising from a nonreversible perturbation are due to very different mechanisms. For the latter problem 
we see that the increase of spectral gap arises from the nonnormality of 5 + yl. In addition, since the 
operator ^*(—5)“^^ is nonnegative in a nonreversible perturbation cannot increase the variance. 
This is in contrast with the problem of maximising the speed of convergence to equilibrium, as was con¬ 
sidered in [37| , where increasing the strength of the nonreversible perturbation could result in a decrease 
of the speed of convergence. 


We note however that a nonreversible perturbation 7 which solves the min-max problem (36) need 
not be a good candidate for reducing the variance of nxif) for a given fixed observable / S Lq{-k). 
Indeed, unless M = {0} for all 5 G , there will always be an observable for which the nonreversible 
perturbation does not reduce the variance, since ii g G M is nonzero, / = {—S)g is nonzero and such 
that cr'jia) = cr'j{0) for any a. Thus, from a practical point of view, it makes more sense to consider the 
problem choosing 7 to minimise the asymptotic variance of T^Tif) for a particular observable. To this 
end, for a fixed / € ^^(Tr), we can identify two distinct cases: G in which case 


lim crf(a) = 0 , 

|a|—>-oo 


and (-5)-!/ in which case. 


|q| 


lim fTj(a) = 2 


fA 


> 0 . 


More generally, consider / G so that / — 7 r(/) G Lq(tt). Assuming we can increase a arbitrarily, 

and neglecting any computational issues arising from the resulting discretisation error (which will be 
discussed in Section]^, the problem of minimising the asymptotic variance of / reduces to finding 7 such 
that 

(-5)-i(/-4/))TA7, (38) 

holds in Checking this condition requires the solution of an elliptic boundary value problem, thus 
this condition is not of practical use. Nonetheless, we can derive some intuition from (38). Clearly, if we 
can choose 7 so that Af = { 0 } in the nonreversible perturbation will be optimal for all observables 
/ G T^( 7 r). In general, it might not be possible to hnd 7 that satisfies this condition. For the nonreversible 
drift considered in 1^, namely 7 = JVV, with = — J , A/” will always be nontrivial; indeed, in this 
case H o V G Af for all functions H such that H oV GT-L^ .In this case, it is always possible to choose 
an observable / such that 7 will not be optimal for /, in the sense that o'j{a) is nonzero in the limit 
a —>■ 00 . We also remark that these asymptotic results, in particular the distinction between these two 
cases is reminiscent of similar results that have been obtained in the context of turbulent diffusion 44p7 . 


An analogous classification of the asymptotic behaviour of the spectral gap of the operator in the 
limit of large a is considered in 18 on a compact manifold. Indeed, in [18[ Theorem 1] it is determined 
that the spectral gap is finite in the limit of a —>■ ±00 if and only if A has a nonconstant eigenfunction 
in 77^. However, one should note that, the asymptotic variance crj(a) may converge to 0 as a ±00 
even when the spectral gap is finite, see also 61 Example 2.9] for a counterexample. 
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3.4 A Two Dimensional Example 

In this section we present a simple example on Consider the problem of calculating 7r(/) with 
respect to the Gaussian distribution 7r(a;) = The reversible overdamped Langevin equation 

corresponding to tt is given by the OU process: 

dXt = -Xt dt + V2 dWf (39) 

We introduce an antisymmetric perturbation a'y{Xf) where 7 (x) is the flow given by 



The infinitesimal generator for the perturbed nonreversible dynamics is 

£ = 5 + aA, 


where 

Sf{x) = -X • Xf{x) + Af{x) and Af(x) = j(x) ■ Xf{x). 
In polar coordinates, this is given by 





dr fir, 0) + dr^rfir, 0) + adefir, 9) + 


fe, 0 ir, 9), 


ir,9) G K>o X ( 0 , 27 r). 


As |a| —>■ + 00 , we expect the deterministic flow to move increasingly along the level curves of the 
distribution 7r(a;). Given an observable /, any variance in TTrif) arising from the variation of / along the 
level curves should vanish as |q;| —)■ oo, leaving only the variance contributed by the variation of / between 
level curves. We make this precise with a particular example. Consider the observable f{xi,X 2 ) = 2xf, 
expressible in polar coordinates as 

/(r, 9) = 2r^ cos^(9) = (1 + cos(20)). 

Noting that 7r(/) = 2, it is straightforward to check that the Poisson equation —C(j) = f — 7r(/), has 
mean zero solution given by 


(j){r,9) 


/\ (cos(2d) — Q;sin(2d)) 


The asymptotic variance can be evaluated directly as follows 


a}=2{4>,i-C)<j>)^ 



(cos(26l) — Q;sin(20)) 

y 1 + a2 


(r^(l + cos(20)) — 2) re ^ dr d9 


Therefore, as |a| —>■ oo, the asymptotic variance converges to 4. We note that, in this case, 

/ = (-‘5)”^ if - 7r(/)) = + y (cos(26»)), 


where (r2/2 — l) is perpendicular to r2/2cos(2d) in . Since the nullspace of A consists of all 

^ II ' l|2 

functions which depend only on r, we have fj^ = r^/2 — 1. It follows that 2 /at = 4, so that 


lim ajia) = 2 

|a:|—>-oo 



2 

1 
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which agrees with the conclusions of Theorem El As |a| —>■ oo, the motion in the 6 direction is averaged 

2 ^ T 

out. Indeed, in this limit, 2 fji^ corresponds to the asymptotic variance of the observable rj dt, 
where is the following ID reversible process. 


rj H-^ dt + \/2dWt, ro > 0. 


drt = + 

For more general observables /(r, 9), we expect maximum variance reduction as |a| —)■ oo when / varies 
strongly with respect to 9. In the other extreme, we expect zero improvement when / depends only on r. 
This intuition is formalised in the following section, in particular in Proposition and the subsequent 
bound (IM). 


For more general potentials, using a flow field of the form ^{x) = JVlog7r(x), = — J, the mech¬ 
anism for reducing the asymptotic variance is analogous: the large antisymmetric drift gives rise to fast 
deterministic mixing along the level curves of the potential, while the reversible dynamics induce slow 
diffusive motion along the gradient of the potential. When the nullspace of A is trivial in the fast 
deterministic flow is ergodic, so that, for a large, the antisymmetric component will cause a rapid ex¬ 
ploration of the entire state space. Consequently, the asymptotic variance converges to 0 as a —> oo. On 
the other hand, if A has a nontrivial nullspace, the antisymmetric perturbation is no longer ergodic, and 
the state space can be decomposed into components such that the rapid flow behaves ergodically in each 
individual component. In the limit of large a, Xt becomes a fast-slow system, with rapid exploration 
within the ergodic components coupled to a slow diffusion between components. Very recently, Rey-Bellet 
and Spiliopoulos have applied Freidlin-Wenzell theory to rigorously analyse this case in the large a 
limit for a large class of potentials. 


4 Nonreversible Perturbations of Gaussian Diffusions 


For the case when the target distribution is Gaussian, the SDE ^ for is linear. In this case, we can 
obtain an explicit analytical expression for the asymptotic variance for a large class of observables /. 
Indeed, consider the nonsymmetric Ornstein-Uhlenbeck process in 

dX^ = -{I + aJ)X^ dt + V2 dWt , (40) 


where J is an antisymmetric matrix, a > 0, and Wt is a standard d-dimensional Brownian motion. 
The stationary distribution 7r(x) is Af{0,I), independent of a and J. Although this system does not fall 
under the framework of Theorem]^ we are still able to obtain analogous conditions for a reduction in the 
asymptotic variance. The objective of this section is to mirror the results for speeding up convergence 


to equilibrium of Xt that were derived in [371 
particular, following arguments similar to 1^ 


to the case of minimizing the asymptotic variance. In 
ection 4.2], an explicit formula for the asymptotic variance 


will be derived, from which an optimal J can be chosen, in a manner similar to 
the process (401, the optimal nonreversible perturbation obtained in 


37 


37] . We note that for 
does not provide any increase 
to the rate of convergence to equilibrium, since all eigenvalues of the covariance matrix of the Gaussian 
stationary distribution are the same. Nonetheless, in this section we show that for certain observables, 
the asymptotic variance of Trxif) can be dramatically decreased. 


4.1 Explicit formula for the asymptotic variance 

We shall assume that the observable / is a quadratic functional of the form 

f{x) = X • Mx + I ■ X + k, 


(41) 


where M £ is a symmetric positive definite matrix, I £ and fc is a constant, chosen so that f{x) 

is centered with respect to 7r(x): k = — TrM. Gonsider the Poisson equation (10): 


— C(j){x) = X ■ Mx + I ■ X — TrM, Tr{4>) = 0, 


(42) 


where C is the infinitesimal generator of (40) given by £ = —{I -I- aJ)x • V -|- Z\. Eor such observables 
we can solve the Poisson equation (42) analytically and obtain a closed-form formula for the asymptotic 
variance for the observable /. 
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Proposition 1 Let A = {I + aJ)~^ = {I — aJ). The unique mean zero solution of the Poisson equation 
is given by 

4>{x) = X ■ Cx -\- D ■ X — Tr(C'), 

where 

1 l•00 

c=- 


(43) 


and 


D = A-^l. 


Moreover, the asymptotic variance is given by 

nOO 

a}{a) = \\M\\l- / ds + 2r (/ + aV^J)-iZ, 

Jo 

where ||^||^ = •s/Tr[AA^\ denotes the Frobenius norm of A, and i?] = AB — BA is the commutator 
of A and B. In particular, 

a}{a) < aj{0), Va S M. (44) 

Proof Clearly 4>{x) must also be a quadratic function of x, so we make the ansatz 

(j){x) = X ■ Cx -\- D ■ x — TiC. 


Plugging this into (421 we obtain 

X ■ A ((C + C^)x + B) — Tr(C' + C^) = x ■ Mx + C a; — TrM. 


Comparing equal powers of x we have 


X ■ A{C + C^)x = x ■ Mx, 
AD■X = I■X 

TtC = ^ TrM 
2 


(45a) 

(45b) 

(45c) 


for all X Since x is arbitrary, it follows that 

D = A-H. 

Equation (45a) is a Lyapunov equation, which is well-posed as M is positive definite and spec(A) C {A € 
C : ReA > 0}. Indeed, for C given by 

1 

C=- e-^^Me-^^Us, 

2 Jo 

both (45aI and (45c) are satisfied. The asymptotic variance cr‘j{a) is then given by (see (26)), 

-a^(a) = {4>{x), /(a:))^ = (x • Cx + D- x — Ti:C,x-Mx + l- x — Tr M)^ 

= {x ■ Cx, X ■ Mx).n + {D ■ X,l ■ x)t^ 

- (TrC',x- Mx)^ - (x- Cx, TrM)^ -f TrCTrM. 

Using the fact that C and M are symmetric, 

(x • Cx,x • Mx)t, = CijMkiK^ [xiXjXkXi] = TrC TrM -1- 2 Tr(CM^). 

Therefore, 


(46) 


-cr?(a) =2 Tr(CM^) + {D ■ x,l ■ x), 
2 ■' 

=2Tt{CM^) + D-1 


(47) 


POO 

/ 


Jo 

• 


ds + PA-H. 
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Since A = I — aJ, where J is antisymmetric, we have that, for all I G 


I ■ A-H =-l ■ [(/ - aJ)-i + (/ + aJ)-^] I 

= h ■ [(/ - aJ)-^{I + aJ)-^{I + aJ) + (/ + aJ)-^{I - aJ)-^{I - aJ)] I 
=l ■ {I + 

Therefore, we obtain the following expression for aj:{a): 

nOO 

<jj{a) = 2 e-2* Tr ds + 21 ■ {I + 

Jo 

Writing the first term as follows: 

pOO -| pOO 

/ e-2"Tr[e“'^"Me-““'*M^]ds =-||M||| - - / || [M, ||^ ds. 

Jo 2 2 Jo 


(48) 


From this, and the facts that J' J >0 and [M,I] = 0, the inequality (441 follows. 


□ 


Since J is skew-symmetric, the matrix exponential is a rotation matrix. Thus, the matrix 

^ajsM q-^Js same eigenvalues as M and . From III.6.14] we have 

A^(M) • A^(M) = A^(e“'^®Me"“‘^") • A^(M^) < Tr , 

where A^(M) and X^{M) denote the vectors of eigenvalues of M, sorted in ascending and descending 
order, respectively. In particular. 


rx> ^ 

/ Tr ds > -X^{M) ■ A'^(M). 

Jo 2 

On the other hand, let Af = ^[J] = N'lJ^ J], and diagonalising J we obtain 


(49) 


J = DU, where D = 


0 

0 

0 

q:A 


, for A = diag(Ai,..., Ad-Af). 


where U is a d x d orthogonal matrix with the first N columns spanning Af. Then 

0 


I - {I + a^J^J)-H = Ul 


(l + aA)- 


TTi Q:—>-±00 tti 

Ul -)■ Ul 


I 

o' 

0 

0 


Ui = \w\\ 


where V is the projection of l onto Af. Thus, for quadratic observables, from (48), the asymptotic 
variance has the following lower bound in the limit of large a: 


lim CT?(a) > a) := A^(M) • A^(M) 2 \lj^\" 

rv—•' '' 


(50) 


In particular, for diffusions with linear drift, the asymptotic variance cannot be decreased arbitrarily. The 
worst case scenario is when f{x) = m — d^ , for some m > 0, for which the asymptotic variance will 
not be decreased, for any antisymmetric matrix J and a G K, analogous to the situation which occurs 


37 for maximising the spectral gap, when all the eigenvalues of the covariance matrix are equal. 
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4.2 Finding the Optimal Perturbation 
We first focus on the case when the observable is linear, i.e. f{x) = I ■ x, similar to that considered 


58 


which 


Section 4.2], so that aj = 2l-{I + a'^J^J) Suppose that we fix a G M, and wish to choose J 
minimizes the asymptotic variance subject to || J||^ = 1. In this case, we have the equality 

liin = 

a—>-±oo ■' 

which is optimal if I is orthogonal to Af. Thus, the best we can do is to choose J such that I is an 
eigenvector of J^J with maximal eigenvalue. This can be done by choosing a unit vector oj G 
orthogonal to I and setting 

0 w — w 0 


J = 


V2 


where I = l/\l\ Then, 


rj = -{-L 


UJ +UJ^tj (j, UJ — UJ l) = 


l^l +UJ^UJ 


is a projector onto {Z,a;} and 
basis. In this case 


= 1, where || j|p. is the Frobenius norm with respect to the Euclidean 

^ ia2 

2 7 T 7\-l7 


aj{a) = 2l-il + a^j' 

•’ 2 + 0!^ 


We note in addition that, when minimising the asymptotic variance, there are many antisymmetric 
matrices J which give the minimal asymptotic variance. As an example, let d = 3, consider the observables 
f(x) = • X for Z^^) = (0, l,l)^/\/2 and Z^^^ = (1,0, l)^/-\/2 and Z^^^ = (1, — 1, l)^/-\/3, respectively. 

We choose J to be 

'0 11^ 


In this case 


!(«) = 


6 


J = ^ I -1 0 1 

^ 1-1 -1 0 , 


-Z • I —a^ 6 + a'^ — 

' a 2 _„2 g + 


(51) 


Z. 


The decay of CTj(a) as a 


00 is strongly dependent on the nullspace of J, given by 
A/” = span[C], where C = (1,—1,1)- 


In Figure 2a we plot the asymptotic variance for these observables. For f{x) = Z^^^ -x, see that as a —)■ 00 
the asymptotic variance converges to 0. This is due to the fact that Z^^^ is perpendicular to the nullspace 
of J. Thus, for this observable the matrix J given by (511 is an optimal perturbation. For /(x) = Z^^^ • x, 
since /(2) 

is not orthogonal to C, as a —)■ 00 , 


a} (a) 


Finally, for /(x) = Z^^) • X we observe that the asymptotic variance remains constant at 2, since Z*-^^ G AC, 
the nonreversible perturbation has no effect on this observable. 


We now focus on the case when Z = 0 so that /(x ) = x • Mx — Tr(M). In this case, it is not clear how 
to construct an optimal J, however the bound (50) suggests a good candidate for J. Suppose that M 


has eigenvalues Ai < A 2 < ... < with corresponding eigenvectors ei,..., e^. Suppose that d is even. 
Let 

■ • ■ 5 jd/2\ ^ 

be a partition of 1,... cZ into disjoint pairs. Define the antisymmetric matrix J by 


d /2 

7 = ^, 


^Jk 


^Jk 


k^l 




(52) 
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(a) Linear observable (b) Quadratic observable 

Fig. 2: The asymptotic variance for a linear diffusion Xt given in ( [40| for linear observables f{x) = -x, 

z = 1, 2, 3 (left), and for a quadratic observable f{x) = x ■ Mix (right). 


In this case can be decomposed into a product of rotations between the pairs of eigenvectors: 


d/2 

fc=l 

where Rv,w{0) is an anticlockwise rotation of angle 6 in the {f,w} plane. Then 

d/2 ^ d/2 

4 


/■OO 1 - 

I ds ^(A,, - A, + - ^(A,, + A.J^. 


fc = l 




For this choice of J, the asymptotic variance is given by 

d/2 


lirn a){a) = - 

a—>-±(x> •' / ^^ 




Arguing by contradiction, this is minimized by choosing iu = k, and jk = (d — A: + 1) for fc = l,...,d/2, 
in which case 

liin cr^(a) =- [(Ai + XdY + (A 2 + A^-i)^ + ... + {\d /2 + Ad/ 2 + 1 )^] 

a—>-±oo '’Z / I j 

k^l 

Clearly, 

XHm) ■ A^(M) < ^XHM) ■ X^M) + 1^X1 < Tr(M2), 

fe=i 

however these bounds are only tight when M = m I, for m > 0. 


As an example, on Figure 2b we plot the asymptotic variance for a quadratic observable x ■ Mix for 
a linear diffusion (40), where d = 4. The matrix Mi is given by 


Ml 


( 3/2 -1/2 0 0 \ 

-1/2 3/2 0 0 1 

0 0 7/2 -1/2 ’ 

Vo 0 -1/2 7/2 / 
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with eigenvalues Xi = i, for i = 1,.. .4. When a = 0, the asymptotic variance is < 7 ^( 0 ) = 30. The lower 
bound (441 is given by = 2 (A 1 A 4 + A 2 A 3 ) = 20. We choose the antisymmetric matrix J : as in 

( 1 ^, that IS 



/ 0 0 1 0 \ 
0 00-1] 
-10 0 0 
\ 0 1 0 0 / 


The resulting asymptotic variance crj(a) is plotted as 
\ ((Ai + A 4 )^ + (A 2 + As)^) = 25 as cr —y 00 . 


a function of a in Figure 2a which converges to 


5 Numerical Experiments 

In this section we present three numerical examples illustrating the effects of the antisymmetric pertur¬ 
bation on the asymptotic variance We will consider target measures defined by Gibbs distributions 
of the form: 

7r(x) = (53) 

where V(x) is a given smooth, confining potential with finite unknown normalisation constant Z. We 
will also assume that the divergence-free vector field 7 ( 0 :) is given by 

jix) =-aJVV(x), J = -J^, (54) 

so that the drift of ^ will be given by 

b{x) = -{(31 + aJ)WV{x). 


The symmetric and antisymmetric parts of the generator in L'^{T‘^;Tr{x)dx) are, respectively, 

S = -I3VV -V + A, A = JVV-V. 

Provided that V G H^{tt) the nullspace of the antisymmetric part of the generator is always nonempty. 
Indeed, the antisymmetry of A implies that AV = 0. Thus, for any observable / such that 7 r(/) = 0 and 

since (/, P)i = ((— 5 )“^/, (—5)P),r = the conclusion of Theorem 0 implies the asymptotic 

variance cr'j{a) will converge to a nonzero constant as a —> 00 . 


5.1 Periodic Distribution 


In the first example we consider a two-dimensional target density given by (53) where the periodic 
potential V(x) is given by 


V{x) = sin(27ra:i) cos(27ra:2), x = (a:i,a: 2 ) € T^, 
and = 0.1. Our objective is to calculate the expectation 7 r(/) of the observable 

f{x) = 1 -I- 4 sin( 47 ra:i)^ -I- 4 cos( 47 ra; 2 )^. 


(55) 


(56) 


In Figure 3a we plot the asymptotic variance over a G [0,12]. The stepsize At for the Euler-Maruyama 
discretisation of (§ is chosen to be 10 For each a, M = 10^ independent realisations of the Markov 
process are run, starting from Xq = (0,0), each for T = 10® time units to ensure that the process is 
close to stationarity. We observe that increasing a from 0 to 10 decreases the asymptotic variance by 
approximately two orders of magnitude. In Figure [3b| we plot the value of the estimator along with the 
confidence intervals estimated over 10® independent realisations. For this particular example, it appears 
that increasing the magnitude of the nonreversible perturbation does not appear to give rise to any 
noticeable increase in bias, while it significantly decreases the asymptotic variance, giving rise to a 
dramatic increase in performance of the estimator TTT{f)- 
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(a) Asymptotic Variance o-‘f(a) 



a 


(b) Estimator TTrif) 


Fig. 3: The asymptotic variance and value of the estimator TTrif), where T = 10®, for the target Gibbs 
distribution with potential defined by (55) and observable as in (56). The plot was generated from 10® 
independent realisations of an Euler-Maruyama discretisation of the nonreversible diffusion § , with 
stepsize 10“® over T = 10® time-units. The shaded region in Figure 3b indicates the 95%-confidence 
interval of the estimator for the given a. 


5.2 Warped Gaussian Distribution 


As a second numerical example we consider computing an observable with respect to a two-dimensional 
warped Gaussian distribution 


23 , defined by (53) with 


V{x) = ^ + {x2 + bxl-10Qbf 


(57) 


The parameter 6 > 0 is chosen to be 6 = 0.05. The potental V{x) is plotted in Figure]^ Our objective 
is to compute 7r(/) where the observable / is given by: 


f{x) = xf +xl. 


The nonreversible perturbation is chosen to be: 

"f{x) = —JW{x), where ^ 

In Figure we plot the asymptotic variance of the estimator TTrif), approximated from 10® independent 
realisations of the process, each run for 10® timesteps of size 10“® so that the total time is T = 10®. Each 
realisation was started at (0,0). As expected, we observe a significant decrease in asymptotic variance 
as the magnitude of the nonreversible perturbation is increased. Increasing a from 0 to 10 gives rise to a 
decrease in variance by a factor of 80. In Figure we plot the value of the estimator ttt averaged over 
the 10® realiations, with corresponding confidence intervals. As a is increased, the bias arising from the 
discretisation error also increases. To mitigate this bias one could decrease the stepsize, thus requiring 
more steps to generate nT{f) or otherwise introduce a Metropolis-Hastings accept-reject step, which will 
be discussed in section [5731 The tradeoff between bias and variance is considered in more detail in Section 

El 


5.3 Introducing a Metropolis-Hastings accept-reject step 

When sampling from a distribution tt it is natural to introduce a Metropolis-Hastings (MH) accept- 
reject step, rather than use the Markov chain obtained directly from an Euler-Maruyama discretization 
of the SDE Given a current state a next state is proposed according to the Euler-Maruyama 

discretisation of §: 

l~A7(x(”)-^Z\tVlog7r(x(”)),Z\t), (59) 
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Fig. 4: Contour plot of the warped Gaussian distribution with potential (57). 



(a) Asymptotic Variance o'}(a) (b) Estimator nrif) 


Fig. 5: Plot of the asymptotic variance cr'j{a) and the estimator for the warped Gaussian distri¬ 

bution TT defined by (57) and the observable f{x) = \x\^ for different values of a. The solid line was 
generated from 10^ independent realisations of an Euler-Maruyama discretisation of t he nonreversible 
diffusion with stepsize 10 ^ over T = 10 ® time-units. The shaded region in Figure [S^ indicates the 
confidence interval of the estimator for the given a. The dashed lined denotes the variance and mean 


when the chain is augmented with a accept/reject step, see Section 5.3 


The proposed state is then accepted (i.e. := X) with probability 


= 1 A 


7T{X)piX(^\X) 

7r(X("))p(X,A:("))’ 


where p(-, x) is the proposal density corresponding to ( |59[ ). By introducing this accept-reject mechanism, 
the resulting chain is guaranteed to have a stationary distribution which is exactly tt, independently 
of At. Thus, introducing a Metropolis-Hastings accept-reject step allows for far larger stepsizes to be 


22 






















used, while still preserving the correct invariant distribution of the chain, eliminating any bias arising 
from the discretisation of the SDE, making it beneficial both in terms of computational performance and 
stability. 

A natural question is whether an MH chain using a proposal distribution based on the SDE ([^ with 
antisymmetric drift will inherit the superior mixing properties of the nonreversible diffusion process. As 
the MH algorithm works by enforcing the detailed balance of the chain with respect to the distri¬ 
bution TT, we expect that any benefits of the antisymmetric drift term will be negated when introducing 
this accept-reject step. To test this, we repeat the numerical experiment of Section |5.2| using the MH 
algorithm using the Euler discretisation of § as a proposal scheme, for various values of a. The effect 
of introducing this accept-reject step to the nonreversible diffusion is evident from Figu re While the 
accept-reject step removes any bias due to discretisation error, as is evident from Figure [5b|the asymp¬ 
totic variance actually increases as a increases. This is due to the fact that for large a, proposals are 
more likely to be rejected as they are far away from the current state. 


5.4 Dimer in a Solvent 


We now test the effect of adding a nonreversible perturbation to a test model from molecular dynamics, 
as described in 38 Section 1.3.2.4]. We consider a system composed of N particles Pi,...,Pjv in a 
two-dimensional periodic box of side length L. Particles Pi and P 2 are assumed to form a dimer pair, 
in a solvent comprising the particles P 3 ,..., Pn■ The solvent particles interact through a truncated 
Lennard-Jones potential: 


VwcA{r) 


|4. 


lo 



if r < ro, 
if r > ro, 


where r is the distance between two particles, e and a are two positive parameters, and rg = 2 s a. The 
interaction potential between the dimer pair is given by a double-well potential 


Vs{t) = h 



{r — tq — wY 




(60) 


where h and w are two positive parameters. The total energy of the system is given by 

V{q) = Vs{\qi - q 2 \) + ^ VwCA{\qi - Qjl) + ywCAilQi - Qjl), (61) 

3<i<j<N i=l,2 3<j<N 


with q = (gi,... ,gAr) G (LT)'^^, where qi denotes the position of particle Pi. The potential Vs has two 
energy minima at r = rg (corresponding to the compact state), and at r = rg -|- 2 w (corresponding to 
the stretched state). The energy barrier separating the two states is h. Define ^(g) to be 


aq) = 


\qi - q 2 \ - rp 

2 w 


This reaction coordinate describes the transition from the compact state, ^(g) = 0 to the stretched 
state ^(g) = 1. The standard overdamped reversible dynamics to sample from the stationary distribution 
7r(g) (X exp(—/3H(g)) is given by 


dqt = -XV{qt) dt + ^/W^dWu 

where jd is the inverse temperature. Our objective is to compute the average reaction coordinate E 7 r^(g). 
We introduce an antisymmetric drift term as follows 

dqt = -[I + aJ)XV{qt)dt+^/ 2 ^dWt, (62) 
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where J € ]^ 2 Afx 2 Ar jg antisymmetric matrix. We consider two types of nonreversible perturbations, 
determined by antisymmetric matrices, Ji and J 2 . The first matrix Ji is the block-circulant matrix 
defined by: 


Ji = 


/ O 2 h ■■ ■ 

—I2 O2 I2 
: —I2 O2 


O 2 


-h \ 
O 2 


O2 ' ■ ' ■ I2 

\ I2 O2 ■ ■ ■ —I2 O2, / 


where I 2 and O 2 denote the 2x2 identity and zero matrix, respectively. The second matrix is given by 


/i? 04 ■ 

• 04 \ 

04 04 . 

. O4 

V04 04 ■ 

. oj 


where R is the following rotation matrix on 

0 1 0 \ 

0 0 1 I 
0 0 0 r 
-10 0/ 

and O 4 is the 4x4 zero matrix. The effect, in this case, is to apply the antisymmetric transformation 
only on the first two coordinates, which correspond to the positions of the particles composing the dimer. 


0 

-1 
\ 0 


Using an Euler-Maruyama discretisation of (62), samples are generated for = 16 particles, with 
parameter values /? = 1.0, a = 1.0, e = 1.0, h = 1.0, w = 0, and a S {0,2,4, 6 , 8 ,10}. For each value 
of Of, a single realisation, starting from 0 is simulated for 10^° timesteps of size At = 10“®, so that 
the total running time is T = 10®. The asymptotic variance is approximated from a single realisation 
of the process using a batch-means estimator (see IV.5]). In Figure we plot the value of the gen¬ 
erated estimator, along with the asymptotic variance for different values of a and for Ji and J 2 . The 
first perturbation provides marginally lower asymptotic variance for this particular observable, giving 
rise to a 66 % decrease, as opposed to a 50% decrease for J 2 , increasing a from 0 to 10. This decrease 
in asymptotic variance, comes at the cost of increased computation time due to having to reduce the 
stepsize to compensate for the discretisation error and numerical instability caused by the large drift. 
While this may appear as a negative result, since the antisymmetric drift terms were chosen arbitrarily, 
no claim is made about optimality. An important issue is whether the increase in computational cost 
outweighs the benefits. This issue will be studied more carefully in Section 


6 The Computational Cost of Nonreversible Langevin Samplers 

As observed Section while increasing a is guaranteed to decrease the asymptotic variance uj{a) for 
the estimator TTrif)-, this will also give rise to an increase in the discretisation error arising from the 
particular discretisation being used. Moreover, as a increases, the SDE becomes more and more stiff, 
to the extent that the discretisation becomes numerically unstable unless the stepsize is chosen to be 
accordingly small. As a result, any discretisation will require smaller timesteps to guarantee 

that the stationary distribution of is sufficiently close to 7 r(a:). This tradeoff between computa¬ 
tional cost and asymptotic variance of the estimator must be taken into consideration when comparing 
reversible to nonreversible diffusions. 


A rigorous error analysis of the long time average estimator was carried out in 47 . In this paper careful 
estimates of the mean square error 


Err^,zit(/) := E 


1 ^ 

-^/(A("))-7r(/) 
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Fig. 6: Value of the estimator ttt{0 for T = 10® with corresponding 95%-confidence intervals, for the 
dimer model, generated from a single realisation of (62). The right plot shows the asymptotic variance of 
the estimator over varying a, with antisymmetric drifts defined by JiS/V{x) and J 2 ^V{x), respectively. 


were derived for discretisations of a general overdamped Langevin diffusion on the unit torus see 
also 70 . In particular, in 47 Theorem 5.2] it is shown that the mean squared error can be bounded as 
follows 


Errj,^^^[f]<C{At^ + 


1 


NAt 


(63) 


where C is a positive constant independent of At and N, which depends on the coefficients of the 
SDE and the observable /. This estimate makes explicit the tradeoff between discretisation error and 
sampling error. For a fixed computational budget iV, the right hand side of (631 is minimized when 
At oc For an SDE of the form (|^, we expect that the constant C will increase with a. Iden¬ 

tifying the correct scaling of the error with respect to a is an interesting problem that we intend to study. 

To obtain a clearer idea of the bias variance tradeoff we compute the mean-square error for the Euler- 
Maruyama discretisatio n fo r two particular examples. In Figure |7a[ we consider the warped Gaussian 
distribution defined by ( [^ and the observable f{x) = \x\^. A value for 7r(/) is obtained by integrating 
/gd f{x)'x{dx) numerically, using a globally adaptive quadrature scheme to obtain an approximation with 

error less than 10“^^. In Figure 7a we plot the relative mean-squared-error defined by (Err^v.zit[/]/’’’(/))^ 
for an Euler-Maruyama discretisation of (|^, for timestep At in the interval [2“®, 1]. The total number 
of timesteps is kept fixed at iV = 10®. For each value of a, the mean square error is approximated over 
an ensemble of 256 independent realisations. Missing points indicate Hnite time blowup of the discretized 
diffusion. The dashed line denotes the MSE generated from the corresponding MALA sampler, namely 
an Euler-Maruyama discretisation of the reversible diffusion with an added Metropolis-Hastings accept- 
reject step. We note that both the Euler-Maruyama discretisation and the MALA sampler require one 
evaluation of the gradient term VlogTr per timestep, so that comparing an ergodic average obtained 
from 10® steps of each scheme is fair. 

A trade-off between discretisation error and variance is evident from Figure [7a| and is consistent with 
the error estimate ( |63| ). We observe that the nonreversible Langevin sampler outperforms the reversible 
Langevin sampler by an order of magnitude, with the lowest MSE attained when a = 10. As a is in¬ 
creased beyond this point, the discretisation error is balanced by the decrease in variance, and we observe 
no further gain in performance. Nonetheless, despite the fact that the MALA scheme has no bias, the 
nonreversible sampler, with a = 5, outperforms MALA (in terms of MSE) by a signihcant factor of 8.8. 

We repeat this numerical experiment for the target distribution tt given by a standard Gaussian dis¬ 
tribution in and observable j{x) = X 2 + x^. In this case 7r(/) is exacty 0. We use the linear diffusion 
specified by (401 where the antisymmetric matrix J is given in (|51|) , which is optimal for this observable. 


We plot the (absolute) MSE for the estimator nT{f) in Figure ^alAVhile the smallest MSE is attained by 
the nonreversible Langevin sampler, when a = 25, the increase in performance is only marginal. This is 
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due to the fact that increasingly smaller timesteps must be taken to ensure that the EM approximation 
does not blow up. Indeed, the a = 25 sampler would not converge to a finite value for At greater than 
10“^, while the reversible sampler (a = 0) and the MALA scheme were accurate even for timesteps of 
order 1. 


From both examples it is clear that managing the numerical stability and discretisation error of the 
skew-symmetric drift term is essential for any practical implementation of the nonreversible sampling 
scheme. This suggests that using higher order and/or more stable numerical integrators to compute long 
time averages would be beneficial to eliminate the bias arising from discretisation, as well as permit 
the use of larger timesteps. Naturally, such schemes would require multiple evaluations of VlogTr at 
each step, thus it is possible that the additional computational cost offsets any performance gain. In the 
remainder of this section, we will perform the same numerical experiments using an integrator based 
on a Strang splitting 35 67 of the stochastic reversible and deterministic nonreversible dynamics. The 


reversible part will be simulated using a standard MALA scheme and the nonreversible flow using an 
appropriate higher-order integrator. Indeed, denote by <Pr,t the evolution of the reversible SDE ([^ from 
time 0 to t, and let denote the flow map corresponding to the ODE: 

z{t) ='y{z{t)), z{0)=x. 

We shall consider an integrator based on the following map from time t to t At: 


'I'At — ^r,Atl2 O ^n,At O ’l’r,At/2- 

For this implementation, we approximate ^r,At{x) using a single step of a MALA scheme with proposal 
based on (1^ with stepsize At. The nonreversible flow <Pn,At is approximated using a fourth-order Runge- 
Kutta method. 


We leave the justification and analysis of this scheme as the goal of future work, and in this paper 
simply use it to compute a long time average approximation to 7r(/) and compare the MSE with that of 
a corresponding reversible MALA scheme. To obtain a fair comparison between the results obtained by 
MALA and the splitting scheme, we note that while a careful implementation of MALA requires only one 
evaluation of VlogTr per timestep, the splitting scheme requires six evaluations of VlogTr per timestep 
(naively a single timestep would require 2 evaluations for each reversible substep and 4 evaluations for 
the nonreversible substep, however we can reuse two evaluations of V log tt between the steps). Thus, we 
shall compare the MSE obtained from trajectories of 10® timesteps of the nonreversible sampler with 
6 • 10® timesteps of the corresponding MALA scheme, for stepsizes ranging from 10“® to 1. The results 
for the warped Gaussian distribution in and standard Gaussian in are plotted in Eigures 7b and 
[8bl respectively. Note that we omit the a = 0 case since, in this case, the splitting scheme reduces to 
standard the MALA scheme. We observe that with this splitting scheme, the nonreversible sampler out¬ 
performs MALA by a factor of 13 for the warped Gaussian model, and by a factor of 20 for the standard 
Gaussian model. The benefits of the splitting scheme appear to be twofold: firstly the integrator is more 
stable, in both models, the long time simulation of A/ did not blow up, even for large values of a, and 
for At = 0.1. Moreover, compared to the corresponding Euler-Maruyama discretisation, the MSE is 
consistently an order of magnitude less. 


While this splitting scheme is only a first step into properly investigating appropriate integrators for 
nonreversible Langevin schemes, the above numerical experiments demonstrate clearly that there is a 
significant benefit in doing so, which motivates future investigation. 


7 Conclusions and Further Work 

In this paper we have presented a detailed analytical and numerical study of the effect of nonreversible 
perturbations to Langevin samplers. In particular, we have focused on the effect on the asymptotic vari¬ 
ance of adding a nonreversible drift to the overdamped Langevin dynamics. Our theoretical analysis, 
presented for diffusions with periodic coefficients and for diffusions with linear drift for which a com¬ 
plete analytical study can be performed, and our numerical investigations on toy models clearly show 
that a judicious choice of the nonreversible drift can lead to a substantial reduction in the asymptotic 
variance. On the other hand, as observed from the dimer model example in Section |5.4[ an arbitrary 
choice of nonreversible drift will not always give rise to significant improvement. We have also presented 
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(a) Relative MSE of the Euler-Maruyama discretisation. 


(b) Relative MSE of the splitting scheme. 


Fig. 7: Plot of the relative mean-square error as a function of the timestep size At for the observable 
f{x) = \x\^ of a warped Gaussian distribution defined by (571, with a fixed computational budget of 10® 
evaluations of V log tt for Figure 7a and 6-10® for Figure 
for the MALA scheme. 
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The dotted line denotes the relative MSE 
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(a) MSE for the Euler-Maruyama discretisation. 


(b) MSE for the splitting scheme. 


Fig. 8: Plot of the mean-square error as a function of the timestep size At for a linear observable of 
a Gaussian distribution, so that is a linear diffusion process. A fixed computational budget of 10® 
evaluations of Vlogir is imposed in Figure 8a and 6 • 10® in Figure 8b The dotted line depicts the 


corresponding error for the Metropolized sampler. 


a careful study of the computational cost of the algorithm based on a nonreversible Langevin sampler, 
in which the competing effects of reducing the asymptotic variance and of increasing the stiffness due to 
the addition of a nonreversible drift are monitored. The main conclusions that can be drawn from our 
numerical experiments is that a nonreversible Langevin sampler with close-to-the-optimal choice of the 
nonreversible drift significantly outperforms the (reversible) Metropolis-Hastings sampler. 

There are many open problems that need to be studied further: 

1. The effect of using degenerate, hypoelliptic diffusions for sampling from a given distribution. 

2. Gombining the use of nonreversible Langevin samplers with standard variance reduction techniques 
such as the zero variance reduction MGMC methodology [l4] . 

3. Optimizing Langevin samplers within the class of reversible diffusions. 

4. The development of nonreversible Metropolis-Hastings algorithms based on the above techniques, 
possibly related to approach described in [^. 
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5. The development and analysis of numerical schemes specifically designed to simulate nonreversible 
Langevin diffusions. 

All these topics are currently under investigation. 
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